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Abstract 

The situation frequently arises where working with the UkeUhood 

frinction is problematic. This can happen for several reasons perhaps 
the likelihood is prohibitively computationally expensive, perhaps it 
lacks some robustness property, or perhaps it is simply not known 
for the model under consideration. In these cases, it is often possible 
to specify alternative functions of the parameters and the data that 
can be maximized to obtain asymptotically normal estimates. How- 
ever, these scenarios present obvious problems if one is interested in 
applying Bayesian techniques. Here we describe open-faced sandwich 
adjustment, a way to incorporate a wide class of non-likelihood ob- 
jective functions within Bayesian-like models to obtain asymptotically 
valid parameter estimates and inference via MCMC. Two simulation 
examples show that the method provides accurate frequentist uncer- 
tainty estimates. The open-faced sandwich adjustment is applied to a 
Poisson spatio-temporal model to analyze an ornithology dataset from 
the citizen science initiative eBird. 

1 Introduction 

For many models arising in various fields of statistical analysis, working 
with the likelihood function can be undesirable. This may be the case for 
several reasons — perhaps the likelihood is prohibitively expensive to com- 
pute, perhaps it presumes knowledge of a component of the model that 
one is unwilling to specify, or perhaps its form is not even known for a 
chosen probability model. Such scenarios present problems if one wishes 
to perform Bayesian analysis. Applying the Bayesian computational and 
inferential machinery, thereby enjoying benefits such as natural shrinkage, 



variance propagation, and the ability to incorporate complex hierarchical 
dependences, usually requires working directly with the likelihood function. 

To motivate the development, we briefly describe the analysis of bird 
sightings contained in Section |5.1[ The data consist of several thousand 



counts occurring irregularly in space and time (see Figure [T]) , along with 
several spatially-varying covariates carefully chosen by a group of ornithol- 
ogists. A natural model for such data is a hierarchical Poisson regression 
with a random effect specified as a spatio-temporal Gaussian process with 
unknown covariance parameters. Here, whetever one's philosophical orien- 
tation, Bayesian methods are most practical to implement, and in addition 
provide sharing of information across space and time, as well as automatic 
uncertainty estimation of predictive abundance maps. Furthermore, ob- 
taining an MCMC sample of the posterior distribution is desirable because 
inferences on the posterior correlation surface of the random effect, a nonlin- 
ear functional of random covariance parameters, is of independent interest 
to the ornithologists. However, the sheer size of the dataset makes MCMC 
under this model intractable, so a faster objective function is used in place of 
the high-dimensional Gaussian likelihood. The goal of the method presented 
here is to enable such a substitution while retaining a valid interpretation 
of the resultant MCMC sample. 
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Figure 1: Spatial locations of Northern Cardinal observations. 
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More generally, suppose that one specifies a model, either one-stage or 
hierarchical, and wants the advantages of being Bayesian, but the likelihood 
in some level of the hierarchy is problematic. Suppose, however, that one 
can write down some objective function £m{G', y) of the parameters and the 
data (possibly conditional on other parameters) that behaves similarly to 
the log likelihood. We will define what we mean by "similarly" in Section 
[2] Important examples of methods that employ such objective functions 
include generalized estimating equations ( Hardin and Hilbe , 2003 1 , general- 



9 i^zed metho d of moments ( Hall , 2005 ) , and robust M-estimation ( Huber and 



Ronchetti, 2009), as well as the two examples we will consider here, covari- 



ance tapering (Kaufman et al. 2008) and composite likelihoods (Lindsay 



1988). 



13 The question we attempt to answer here is this: Can we insert £m{(^', y), 

14 in place of the likelihood, into an MCMC algorithm like Metropolis-Hastings 

15 and "trick" it into doing something useful? We claim that we can — that for 

16 many useful examples, simply swapping ^M(0;y) into a sampler results in 

17 a ^uasi-posterior sample that can be rotated and scaled to yield desirable 

18 properties. 

19 The OFS adjustment relies on asymptotic theory that was formally devel- 

20 oped in Chernozhukov and Hong (2003j), but which is quite intuitive. These 

21 authors were interested in using Metropolis-Hastings as an optimization al- 

22 gorithm for badly-behaved objective functions, not in using non-likelihood 

23 objective functions for performing Bayesian-like analysis, as we are here. 

24 Although their goals were entirely different, the theory contained therein is 

25 extremely useful for our purposes. 

26 Previous attempts to incorporate non-likelihood objective functions into 

27 the Bayesian setting, to our knowledge, have been few. McVean et al. (2004) 

28 use composite likelihoods within reversible jump MCMC, without any ad- 

29 justment, to estimate population genetic parameters. Realizing that their 



McVean et al. 


( 


2004) turn to a 


variability. Smith and Stephen- 



son 



( 2009 ) were interested in max-stable processes for spatial extreme value 
analysis. They also use composite likelihoods within MCMC without ad- 
justment. The special case of using the generalized method of moments 



35 objective function (Hansen, 1982 Hall 



within an MCMC sampler was explored by Yin ( 2009 ) 
is 



2005|) for generalized linear models 
Tangentially related 



Tian et al. ( 2007 ) , who use MCMC to estimate the sampling distribution 



38 of = argmax£M(^; y)- 



Cooley et al. (2012) attempt to solve the same problem that we address 



here. Whereas we adjust quasi-posterior samples generated from MCMC 
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1 post hoc, these authors propose an adjustment to the Metropolis hkehhood 

2 ratio within the sampler itself. Their goal, like ours, is to achieve desir- 

3 able frequentist coverage properties of credible intervals computed based on 

4 MCMC. Although their approach is quite general, Cooley et al. (2012) re- 

5 strict their attention to using composite likelihoods for max-stable processes. 



The approach taken in the present article is closely related to that of Cooley 



et al. (2012), but the OFS adjustment differs from their adjustment in its 



structure as well as its motivating asymptotic arguments. 

Both the motivating insights for the OFS adjustment and the criterion 
by which we evaluate it is essentially the idea of calibration ( Draper , 2006 ) . 
In our interpretation, a well-calibrated method has the property that when 
used to construct credible intervals from many different datasets, those in- 
tervals ought to cover the true parameter at close to their nominal rates. 
Essentially, this says that well-calibrated credible intervals behave like con- 
fidence intervals. If we construct intervals with accurate coverage directly as 
the a/2 and (1 — a/2) empirical quantiles of an MCMC sample for different 
values of a, we claim that in some way our uncertainty about a parameter is 
well-described by the sample. Evaluating an approximate Bayesian method 
by this criterion has intuitive practical appeal, and it has been endorsed in 



particular by objective Bayesians (Bayarri and Berger, 2004 Berger et al. 
e.g.). 



2001 



This principle, along with some basic asymptotic observations, leads to 
the OFS adjustment. The asymptotic theory gives us the limiting normal 
distribution of quasi-Bayes point estimators. We take this distribution, in 
an informal sense, to be a summary of our uncertainty about 6, up to 
an asymptotic approximation. The asymptotic theory also gives us the 
limiting normal distribution of the quasi-posterior. Since these two limiting 
distributions are not, in general, the same, and since we would like the 
quasi-posterior to summarize our uncertainty about 6 in the sense of being 
well-calibrated, our strategy is to adjust samples from the quasi-posterior 
so that their limiting distribution matches that of the quasi-Bayesian point 
estimator. 

We note the temptation to ask how well the adjusted quasi-posterior 
distribution approximates the true posterior distribution, in cases when the 
true likelihood is available. However, this is the incorrect comparison to 
make. The true posterior distribution contains the information about 6 
obtained through the likelihood. When some other function iM is used in 
place of the likelihood, there is no reason to expect the information content 
to remain the same. We would like the adjusted quasi-posterior distribution 
to represent this loss of information, not hide it. In our simulation examples 
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1 in Section |4j the frequentist accuracy of credible intervals based on adjusted 

2 quasi-posterior samples shows that the OFS adjustment accomplishes this 

3 task. 

4 Throughout, it will be assumed that expectations will be computed with 

5 respect to the true parameter 9q. We define the square root of a symmetric 

6 positive definite matrix A to be A^/^ = OD^/^O', where A = ODO' with 

7 O orthogonal and D diagonal. The square root of a matrix is not unique; 

8 here we compute A^/^ using the singular value decomposition, which is 

9 numerically stable and preserves key geometric attributes. 

10 We begin in Section [2] by defining the quasi-Bayesian framework and 

11 reviewing the relevant asymptotic theory. In Section [3] we develop the OFS 

12 adjustment method, and we demonstrate how to apply it in two different 

13 statistical contexts in Section |4j In Section [5] we apply the OFS adjustment 

14 to analyze a dataset of Northern Cardinal sightings taken from the citizen 

15 science project eBird. Section [6] concludes. 



16 2 The quasi-Bayesian framework 

17 We begin by assuming that the parameter of interest 6 lies in the interior 

18 of a compact convex space 0. Suppose we are given y, which consists of 

19 n observations, from which we wish to estimate 6. Suppose further that 

20 we have at our disposal some objective function ^Af(^;y) from which it is 

21 possible to compute 6m = argmax0 iM{d] y)- 

22 Following Chernozhukov and Hong ( |2003 ), we define the quasi-posterior 



distribution based on n observations as 

where LM,n{6; Yn) = exp {lM,n{6; yn)}, and vr(0) is a prior density on 0. We 
will assume, for convenience, that vr(0) is proper with support on 0. The 
function Lm^u is not necessarily a density, and thus T^M,n{'9\yn) is not a true 
posterior density in any probabilistic sense. We will assume, however, that 
LM,n is integrable, so as long as the prior vr(0) is proper, it easily follows 
that i^M,n{6\yn) will be a proper density. 

Equipped with notion of a quasi-posterior density, we can define quasi- 
posterior risk as Rnid) = J^Pnid — ^*)7rM,n(^*|yn) d0*, where /On(u) is 
some convex scalar loss function. For simplicity, we assume that Pri(u) is 
symmetric, although this assumption may be dropped. Then for a given 
loss function, the quasi-Bayes estimator is naturally defined as 0qb = 
argmin^gQ Rn{6), the value of that minimizes quasi-posterior risk. 
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Our requirements on iM,n{d] Yn) sue fairly minimal and are met by most 
objective functions in wide use in statistics. Technical assumptions are con- 
tained in Chernozhukov and Hong (2003), but they are in general satisfied 
when Om is weakly consistent for Oq and asymptotically normal. 

Asymptotic normality of 6m is of the form 



Mm 



0o) ^iV(0,I) 



(2) 



where 



— Eo[^O^M,i 



(3) 



The notation Vq/ refers to the gradient of the function / evaluated at the 
true parameter 6q, and HqJ refers to the Hessian of / evaluated at Oq. These 
matrices have been defined in terms of partial derivatives, but in general, 
^M,n does not have to be differentiable or even continuous for the theory to 
apply. In this case, small adjustments of the definitions of P^^ and f^re 
necessary. 

The sandwich matrix J^^ is familiar from generalized estimating equa- 
tions, quasi-likelihood, and other areas, and is referred to by various names, 
including the Godambe information criterion and the robust information 
criterion (e.g. Durbin 1960 Bhapkar, 1972 Morton, 1981: Ferreira, 1982 



Godambe and Heyde 



1987 



Heyde, 1997). We note that in the special case 

17 when £M,n(^; y) is the true likelihood, Qn = J„, the Fisher information. We 

18 will hereafter assume that this is not the case. 



19 2.1 Review of relevant asymptotic theory 



Chernozhukov and Hong (2003) elucidates the asymptotic behavior of TrM,n{(^\yn), 

21 which motivates the open- face sandwich adjustment. These results are di- 

22 rect analogues of well-known asymptotic properties of true posterior distri- 

23 butions. Their Theorem 2, which we re-state below, states that the asymp- 

24 totic distribution of the quasi-Bayes estimator 0QB,n is the same as that of 

25 the extremum estimator 0M,n- 

26 Theorem 1 Assuming sufficient regularity of iM,n{^',yn), 

j'J\eQB-eo)^Nio,i). 
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Theorem [T] above is the quasi-posterior extension of the well-known result 
that, under fairly general conditions, Bayesian point estimates have the same 
asymptotic distribution as maximum likelihood estimates. 



Theorem 1 of Chernozhukov and Hong (2003), which we re-state here 



in a slightly different form, is a kind of quasi-Bayesian consistency result, 
showing that quasi-posterior mass accumulates at the true parameter 9q. 

Theorem 2 Under the same conditions as Theorem{l\ 

{(^\yn)\\TV — > 0, 

where \\ ■ \\ tv indicates the total variation norm, and iTM,ooid\yn) is a normal 
density with random mean + Qn ^^•^Af,n(^o) o^f^d covariance matrix Q^^. 

Theorem [2] may be arrived at informally via a simple Taylor series argument. 
It is therefore intuitive that the quasi-posterior converges to limiting normal 
distribution whose covariance matrix is defined by the second derivatives of 

The key observation is that the limiting quasi-posterior distribution has 
a different covariance matrix than the asymptotic sampling distribution of 
the quasi-Bayes point estimate. The consequence is that the usual Bayesian 
method of constructing credible intervals based on quantiles of the quasi- 
posterior sample will, viewed as confidence intervals, not have their nominal 



19 ifrequentist coverage probabilities. Fortunately, thanks to Chernozhukov and 



Hong ( 2003 ) , we know what those two asymptotic covariance matrices look 



21 like, which suggests a way to "fix" 7rM,n(^|yn)- 



22 3 The open-faced sandwich adjustment 

23 Let us assume that we have a sample of draws from TTM,n{d\yn), generated 

24 by replacing the likelihood with ^Af(^;y) in some MCMC sampler such as 

25 Metropolis-Hastings. Our aim here is to adjust the quasi-posterior draws 

26 such that the adjusted sample realistically reflects how the data informs our 

27 uncertainty about the parameter of interest 6 through the function Im{^] y)- 

28 Were that the case, the usual credible intervals constructed from empirical 

29 quantiles of the adjusted sample would have close to nominal coverage. We 

30 will accomplish this by constructing a matrix f2„ that, when applied to 

31 the (centered) quasi-posterior sample, will rotate and scale the points in an 

32 appropriate way. 
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1 We have observed that whereas the asymptotic covariance matrix of j[/f ^ 

2 is the sandwich matrix J^^, the asymptotic covariance matrix of the quasi- 

3 posterior distribution is a sing le "shce of bread" Q-^ What we want to 

4 do then is complete the sandwich by joining the sHce of bread to the 

5 open-faced sandwich P„Q^^ to get J~^. 

6 We define n„ = Q-^pI/^Q}/'^ , the open-faced sandwich adjustment ma- 

7 trix. One can easily check that if Z„ ~ A^(0, Q^^), then fi„Z„ ~ A^(0, J^^)- 

8 The idea then is to take samples from '7rM(^|y) obtained via MCMC and 

9 pre-multiply them (after centering) by an estimator of to "correct" the 

10 quasi-posterior sample. That is, if 6^^\ . . . , 6^"^^ is a sample from vrM(^|y)) 

11 then for each j = 1, . . . , J, 

^OFS = ^QB + ^{0^'^ - Oqb) (4) 

12 is the open-face sandwich adjusted sample. It is clear that a consistent 

13 estimator of fi will generate credible intervals that are consistent (1 — a) 

14 confidence intervals. 



15 3.1 Estimating fl 

16 The OFS adjustment Q requires an estimate of the matrix CI, which in 

17 turn requires estimates of P and Q. Because the OFS adjustment occurs 

18 post-hoc, it is possible to leverage the existing MCMC sample to compute 

19 ft. There are many possible approaches to this task, and here we offer some 

20 suggestions, which we summarize in Table [T} 

21 _ While is P is notoriously difficult to estimate well (see Kauermann and 



Carroll 2001, for some examples), Theorem [T] immediately suggests a way 



to estimate Q directly from the MCMC sample with almost no additional 
computational cost. Specifically, noting that the quasi-posterior density con- 
verges to a normal with covariance matrix Q~^, a natural estimate Qj^^ is 
just the sample covariance matrix of the MCMC sample. Another possibil- 
ity that requires almost no additional computation is to retain the results of 
the evaluations of Im at each iteration of the sampler and use them to nu- 
merically estimate the Hessian matrix at 0qb- This Hessian approximation 
will generally be a good estimator Qn of Q. 

These estimators of Q are not only simple to compute, but they arise as 
direct results of MCMC output, requiring no additional analytical deriva- 
tions based on Im- They are, in this sense, "model-blind." Unfortunately, 
we are unaware of any such "model-blind" estimators of P. The simplest 
solution, in the case where we can write an expression for V£m{0', y) and the 
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data y consists of n independent replicates, is to compute a basic moment 
estimator 

1 " 

Pi = -Y.Vhi{eQB;yi)Vhi{eQB;yiy, (5) 
1=1 

which is consistent as n — t- oo under standard regularity conditions. We use 
equation ([s]) in Section and 4.2 , where we have replication. However, because 



V^A/(0QB;y) converges to zero, when we only observe a single realization 
of a stochastic process, as in Section 4.1, equation ([s]) fails to provide a 



viable estimator. In this latter example, analytical expressions for P(0) are 
available. Plugging 0qb into the analytical asymptotic expression gives an 
estimator Pn. If a corresponding analytical expression exists for Q, we call 
the corresponding plug-in estimator Qm 

When an expression for P{0) is unavailable, but when it is possible to 
simulate the process that generated y, the parametric bootstrap is an attrac- 
tive option. Let y i , . . . , yx be K independent realizations of the stochastic 
process generated under 0qb- Then 



1 ^ 

Phnot = ^ J^V^Af(0QB;yfc)V4f(6>QB;yfc)' (6) 



boot 

k=l 



is the parametric bootstrap estimator of P (an analogous estimator could, 
of course, be used for Q). A nice feature of ^ and ^ is that, at the ex- 
pense of (perhaps considerable) computational effort, one could substitute 
finite-difference approximations to the required gradients to obtain reason- 
able estimators, even in the absence of available closed-form expressions for 

Estimator Description 

Qj"^ sample covariance of MCMC sample 

Qii Hessian of ^m(^qb) 

Qui plug 9qb into asymptotic formula 

Pi moment estimator based on score vector 

Pii plug 9qb into asymptotic formula 

Pboot parametric bootstrap 

Table 1: Summary of estimators of sandwich components. 
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1 3.2 The curvature adjustment 



2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 



We now describe the curvature-adjusted sampler of Cooley et al. (2012). 



22 
23 
24 
25 
26 
27 
28 



This sampler was presented as a way to include composite likelihoods in 
Bayesian-like models but in fact has far wider generality. Composite likeli- 



hoods (Lindsay, 1988) are functions of and y constructed as the product 

In effect, composite likelihoods 
Under fairly general 



of joint densities of subsets of the data, 
treat these subsets as though they were independent, 
regularity conditions, the asymptotic distribution of maximum composite 
likelihood estimators have sandwich form ^ (Lindsay, 1988). Although 
Cooley et al. (2012) consider only composite likelihoods, their argument 



holds equally well for any function ^m(^; u) with sandwich asymptotics. 

The curvature- adjusted sampler begins by computing the extremum es- 
timator 6m and as a preliminary step. It works by modifying the 
Metropolis-Hastings algorithm using a transformation of the form (Q such 
that the acceptance ratio has a desirable asymptotic distribution. Specifi- 
cally, at each iteration j = 1, . . . , J, the algorithm proposes a new value 6* 
from some density q{-\0^''^) and compares it to the current state 6^^^ to eval- 
uate whether to accept or reject the proposal. The difference between the 
curvature-adjusted sampler and the traditional Metropolis-Hastings sam- 
pler is that this comparison now takes place after 6* and 6^^^ are scaled and 
rotated. That is, 9* is accepted with probability 



min < 1 , 



LM(0gi;y)vr(0(^-))g(r|0(^-)) 



(7) 



where Oq^ = 6 m + ^{0 m){9* — m) , and analogously for o'^^j^- [Cooley et al. 
(2012) use a result from Kent (1982) to show that the ratio in (1 71) has the 



same asymptotic distribution as that of the true likelihood ratio, and argue 
that the resultant sample has an asymptotic stationary distribution that is 
normal with covariance J~^, as desired. Note that unlike the OFS adjust- 
ment, the curvature-adjusted sampler requires outside initial estimates of 6 
and ri because the adjustment occurs within the sampling algorithm. 



29 3.3 OFS within a Gibbs sampler 

30 With some care, the OFS adjustment may be applied in the Gibbs sampler 

31 setting. Suppose we divide 9 into B blocks such that Oi, ... ,6b forms a par- 

32 tition of 0, and we wish to draw from the quasi- full conditional distribution 

33 with density /(0j|0_j,y) oc LM{y\d)f{6i), where 0-i refers to the elements 

34 of 9 not contained in Oi. Then the adjustment matrix Qg.^g . is defined as 
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before, only now it applies only to Oi and is conditional on If all full 
conditional densities f{0i\6-i,y), i = 1,...,B, are quasi-full conditional 
densities in the sense that they are proportional to a product of ^^(yl^) 
and another density, the Gibbs sampler may be run by successively drawing 
from y), i = 1,. . .B, and the OFS adjustment may proceed post 

hoc as before. This can be seen by viewing the Gibbs sampler as a special 

case of Metropolis-Hastings (see Robert and Casella, 2004 Section 7.1.4). 

(7) 

Now suppose that at iteration j of a Gibbs sampler we have drawn Of 

from the quasi-full conditional f{0i\6^^l,y). Suppose further that f{Oi+l\0^f|^^_^_l^ 
is not a function of Lm, as will be the case for many parameters in hier- 
archical models that contain Lm- Since y) is a true full 



conditional density and not a quasi-full conditional density, there is no OFS 

i) 

-(i+i)' 



adjustment to make. But clearly -|^s,y) depends on 6^-'\ and 



(7) 

as a result, plugging in un- adjusted samples of 6- will not result in the 

desired stationary distribution. It is clear then that to achieve proper vari- 

(7) 

ance propagation through the model, we must adjust Of before plugging 

it into y). Therefore, the OFS adjustment within the Gibbs 

sampler may not be applied post hoc, but rather must occur within the 
sampling algorithm. 

Embedding OFS adjustments within Gibbs samplers requires careful 
consideration of the conditional OFS matrices Qq-^q ., i = 1,...,B, the 
adjustment matrices associated with the quasi-full conditional distributions 
f{6i\6^i), i = 1,...,B. Because it is defined conditionally on ide- 
ally each should be re-estimated at each iteration j based on the 

current value 0^]. We refer to ^g^^e^i the conditional OFS adjustment 
matrix for 6- at iteration j. Implementing the OFS Gibbs sampler us- 
ing these conditional OFS adjustments requires estimation of by 
one of the techniques described in Section [3. 1[ for each block of parameters 
i = 1,...,B with corresponding quasi-full conditional depending on Lm, 
and for each Gibbs iteration j in 1, . . . , J. Furthermore, each computation 

of . requires an estimate of as input, necessitating some sort of 

optimization or MCMC, again nested within each block i = 1, . . . ,B and 
each iteration j in 1 , . . . , J. This is an enormous computational burden. 

Instead, we make the simplifying assumption that ^g ^g does not change 
much from iteration to iteration. We instead use a constant (with respect to 
j) estimate ^g^g ., where is fixed at its marginal quasi-Bayes estimate. 

(7) 

We refer to ^g^g as the marginal OFS adjustment matrix for 0- . 
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1 

2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 



Despite this simplification, the OFS-adjusted Gibbs sampler still requires 
additional work relative to an an- adjusted sampler because the algorithm 
requires 0qb and ^g^g ., i = 1,..-,B, as input. In practice, then, we 
run the sampler twice. The first time, recalling that Theorem [2] says the 
un-adjusted quasi-posterior concentrates its mass at 6q, we make no OFS 
adjustments, and use the generated sample to produce Oqb- We next use 
^QB to produce i^g^g , i = 1,. ■ ■ ,B, using one of the methods described 
in Section |3.1[ Finally, the Gibbs sampler is re-run, this time with the 
transformation defined by Q applied for each block i and each iteration j, 
i = 1, . . . , B, j = 1, . . . , J. Thus, the computational burden required to use 
marginal OFS adjustments is approximately twice that of the un-adjusted 
Gibbs sampler. In contrast, the additional computational burden required 
to use conditional OFS adjustments may range from several fold to several 
thousand fold, depending on the method used to estimate ft 



U) 



We have explored (informally) the effects of using the much more com- 
putationally efficient marginal adjustment instead of the conditional ad- 
justment and found only very minor differences in the resultant adjusted 



quasi-posteriors. (See Section 4.1 for an example.) The issue of conditional 



vs. marginal adjustments also appears in Cooley et al. (2012). They refer 



to using constant (in j) adjustment matrices as an "overall" Gibbs sampler 
and using conditional adjustment matrices as an "adaptive" Gibbs sampler. 



Corroborating our findings, Cooley et al. (2012), in a more systematic study 



using a very simple model, found very little difference between their overall 
and adaptive curvature-adjusted quasi-posteriors. 



4 Examples 



We now describe two examples of non-likelihood objective functions that 
have appeared in the literature. In each example, working with the likeli- 
hood is problematic for a different reason, and each fits into the OFS frame- 



29 work. In the first example, we apply covariance tapering (Furrer et al. , 2006 



Kaufman et al. , 2008; Shaby and Ruppert, 2012) to large spatial datasets. 



Here the likelihood requires the numerical inversion of a very large matrix. 
For large datasets, this inversion becomes prohibitively computationally- 
expensive, so the likelihood is replaced with its tapered version, which lever- 
ages sparse- matrix algorithms to speed up computations. In the second ex- 



35 ample, composite likelihoods for spatial max-stable processes (Smith, 1990 



Padoan et al. , 2010), a probability model is assumed, but the likelihood 



37 consists of a combinatorial explosion of terms, and is therefore completely 
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intractable for all but trivial situations. Hence, in this example, the likeli- 
hood function is simply not known. These examples may be considered toy 
models in that one could easily maximize their associated objective functions 
and compute sandwich matrices to obtain point estimates and asymptotic 
confidence intervals. We use these examples simply to illustrate the effec- 
tiveness of the OFS framework. 

For each example in this section, we conduct a simulation study to in- 
vestigate how well the OFS adjustment performs by measuring how often 
nominal (1 — a) credible intervals cover 9q. To do this, we draw datasets 
Yk, k = 1,...,1000, from the model determined by some fixed 6q. We 
then run a random walk Metropolis algorithm, with iMi^'iYk) inserted in 
place of a likelihood, on each of the 1000 datasets. Next, we use each set 
of MCMC samples to compute estimates 0QB,fc and fifc using different es- 
timators as discussed in Section 



3.1 



Finally, we use each 0QB,fc and Q,k to 
adjust their corresponding batch of MCMC output, and record the resultant 
equi-tailed (1 — a) credible interval for many values of a. In addition, we run 



the curvature- adjusted sampler of Cooley et al. (2012) for comparison. For 



each example, empirical coverage rates are plotted against nominal coverage 
probabilities. 



4.1 Tapered likelihood for spatial Gaussian processes 

The most common structure for modeling spatial association among obser- 



vations is the Gaussian process (Cressie, 1991 Stein, 1999). In addition to 



modeling Gaussian responses, the Gaussian process has been used exten- 
sively in hierarchical models to induce spatial correlation for a wide variety 



of response types (Banerjee et al. , 2004). 



Here we assume that Y{s) ~ GP(0, C{0); s), a mean-zero Gaussian pro- 
cess whose second-order stationary covariance is given by a parametric fam- 
ily of functions C indexed by 6, depending on locations s in some spatial 
domain D. We will further assume that the covariance between any two 
observations yi and Uj located at Sj and Sj is a function of only the distance 
Sj||. Then the likelihood for n observations from a single realization of 



Yis] 



IS 



77 1 

,(0;yO=--log(27r)--log(|5]„(0)|) 



1 



(8) 



where S,j_„(0) = C{9; ||sj — Sj 
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While conceptually simple, these Gaussian process models present com- 
putational difficulties when the number of observations of the Gaussian pro- 
cess becomes large, as the likelihood function ([s]) requires the inversion of a 
n X n matrix, which has computational cost O(n^). To mitigate this cost, 
Kaufman et al. (2008) proposed replacing ([s]) with the tapered likelihood 



function 

Tl 1 
kn{0; yn) = - 2 log(2^) - 2 log(l^n(^) ° 

-^y;((5]„(0)oT„)-ioT„)y„, (9) 

where the o notation denotes the element- wise product, and Tjj = /3t(||sj — 
Sj II), a compactly-supported correlation function that takes a non-zero value 
when ||sj — Sj|| is less than some pre-specified "taper range." The compact 
support of pt induces sparsity in T„, and hence all operations required to 
compute ([9]) may be computed using specialized sparse-matrix algorithms, 
which are much faster and more memory-efficient than their dense-matrix 
analogues. 

Under suitable conditions, the tapered likelihood satisfies asymptotics of 
the form ([2]), and Theorems [l] and [2] apply (Shaby and Ruppert, 2012). For 



the simulations, we take C{6; ||sj — Sj||) = a exp{—c/a ■ ||sj — Sj||}, with 
9 = (a'^^c)' = (1, 0.2)'. The observations are made on a 40 x 40 unit grid, so 
that each dataset y is a single 1600-dimensional realization of a stochastic 
process. Half-Cauchy priors were used for both parameters. 

For this example, analytical expressions for both P(0) and Q(0) are 
available (Shaby and Ruppert, 2012[ ). As described in Section 3.1 



we use 



the plug-in estimator Qk = Qiii(^qb,A:) ^Pii(^QB,fc)^''^Qiii(^QB,fc)^/^, as 
well as Clk = Qf ^Pii(^QB,fc)"^^^Qr^' ■^ith Qi^ computed directly from the 
MCMC sample, for each k = 1, . . . , 1000 simulated datasets. 

Figure [2] shows that the un-adjusted MCMC samples (dotted curves) 
yield horrible coverage properties for both and c. It is somewhat inter- 
esting that while the "naive" intervals severely under-cover o"^ , they severely 
over-cover c. We therefore see that a naive implementation results in being 
overly optimistic about estimates of o"^ while being overly pessimistic about 
estimates of c. The OFS-adjusted intervals display much more accurate 
coverage, achieving nearly nominal rates, although for c, the asymptotic ex- 
pression for Q seems to produce intervals that are systematically slightly 
too short. The curvature-adjusted sampler results in similar coverage. 

To explore how the marginal OFS adjustment differs from the condi- 
tional adjustment in the Gibbs sampler setting, we simulate data from a 
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Figure 2: Empirical coverage rates for equi-tailed credible intervals based 
on MCMC samples using the tapered likelihood. Blue and red curves are 
OFS-adjusted samples using different estimates of fi, green curves are from 
a curvature- adjusted sampler, and dotted curves are un-adjusted samples. 



spatial linear model, Y{s) ~ GP(X/3, C(0); s). We set /3 = (-0.5,0,0.5)' 
and use the same spatial design and covariance function as above. The de- 
sign matrix X is a 1600 x 3 matrix of standard normal deviates, and the 
prior distribution for /3 is a vague normal centered at zero. At each Gibbs 
iteration, 6 is updated using a Metropolis step using the tapered likelihood, 
and (3 is updated by drawing directly from its conditionally conjugate full 
conditional distribution. The sampler is first run without adjustment, and 
0QB and /3qb are computed as the quasi-posterior means. The marginal 
OFS adjustment matrix i s then computed using the analytical expression 



from Shaby and Ruppert (2012) by plugging in 0qb and /3qb- Next, the 
Gibbs sampler is run a second time using the estimated marginal OFS ma- 
trix to adjust the sample from the full conditional distribution of 9 at each 
iteration. Finally, the Gibbs sampler is run a third time, this time estimating 
the conditional OFS adjustment matrix at each iteration by maximizing the 
tapered likelihood function and plugging the resultant parameter estimates 
into the asymptotic formula for Vt. 

Because the conditional OFS-adjusted Gibbs sampler is so computa- 
tionally expensive, we simulate just a few datasets and report the output 
from one of them. Figures 3(a) and [3(b)] compare the marginal adjusted 
quasi-posterior distributions for the two covariance parameters, generated 
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with the marginal and conditional OFS adjustments. The qq-plot for cx^ 
shows almost perfect agreement except for a handful of MCMC samples on 
the upper tail. For the c parameter, the qq-plot shows that the marginal 
OFS adjustment produces a quasi-posterior that is the same shape as that 
of the conditional adjustment, but is slightly shifted to the right. Figure 



3(c) shows contours of a kernel density estimate of the joint marginal ofs- 



adjusted quasi-posterior distribution of the two covariance parameters, with 
the marginal adjustment in black and the conditional adjustment in gray. 
The contours are very similar, with some small differences appearing on 
the right half of the ci^-axis, indicating good agreement between the two 
bivariate distributions. The output from all the simulated datasets looked 
qualitatively similar, with no noticeable systematic differences between the 
two adjustments. The choice of adjustment had no discernible effect on the 
quasi-posterior distribution of /3. 



15 4.2 Composite likelihood for max-stable processes 

16 Statistical models for extreme values that include spatial dependence are 

17 useful for studying, for example, extreme weather events like heat waves 



and powerful storms (Cooley et al. , 2007, Sang and Gelfand, 2010, e.g.). 



Extreme value theory tells us that the distribution of block-wise maximum 
values (such as annual high temperatures) of independent draws from any 
distribution converges to a generalized extreme value (GEV) distribution 



(see Coles, 2001), if it converges at all. The asymptotics therefore suggest 
that any model of block- wise maxima at several spatial locations ought to 
have GEV marginal distributions with distribution function 



F{y;n,cr,^) = exp 



1 + C 



}■ 



where is a location, a a scale, and ^ a shape parameter that determines 
the thickness of the right tail. The GEV may be characterized by the max- 



27 stability property (Coles, 2001). More generally, block- wise maxima of ran- 



28 dom vectors must also converge to max-stable distributions. 



Sang and Gelfand ( 2010 ) achieve spatial dependence with GEV marginals 



30 through a Gaussian copula construction. However, Gaussian copula models 



31 have been strongly criticized ( Kliippelberg and Rootzen , 2006 ) because they 



do not result in max-stable finite-dimensional distributions, nor do they per- 
mit dependence in the most extreme values, referred to as tail dependence, 
and it is clear that physical phenomena of interest do exhibit strong spatial 
dependence even among the most extreme events. 
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An alternate approach for encoding spatial dependence of extreme values 



is through max-stable process models (de Haan, 1984), which are stochastic 



processes over some index set where all finite-dimensional distributions are 
max-stable. Explicit specifications of spatial max-stable processes based on 



5 the de Haan (1984) spectral representation have been proposed by Smith 



(1990), Schlather (2002), and Kabluchko et al. (2009). These formulations 



have the advantage that they do represent tail dependence. 

Unfortunately, for all of the available spatial max-stable process models, 
joint density functions of observations at three or more spatial locations are 
not known (a slight exception is the Gaussian extreme value process (GEVP) 



(Smith, 1990), for which Genton et al. (2011) derives trivariate densities). 



12 Since bivariate densities are known, Padoan et al. (2010 ) proposes parameter 



estimation and inference via the pairwise likelihood, where all bivariate log 
likelihoods are summed as though they were independent: 



The pairwise likelihood is a special case of a composite likelihood (Lindsay 
1988). Padoan et al. (2010) show that asymptotic normality of the form ([2j) 



applies, so we may again apply the OFS adjustment. 

Our simulation experiment consists of 1000 draws from a GEVP with 
unit Frechet margins on a 10 x 10 square grid, with 100 replicates per draw. 
An example of a single replicate is shown in Figure [4} This setup would 
correspond, for example, to 100 years of annual maximum temperature data 
from 100 weather stations. 

The unknown parameter in a 2-dimensional GEVP is a 2 x 2 covariance 
matrix. For this simulation, 9q = (En, S12, S22)' = (0.75,-0.5,1.25)'. The 
prior distribution for S is a vague inverse-Wishart. For each draw, a long 
MCMC chain is run and is computed as the posterior mean. In addition, 
for each draw, all four Q — P combinations of Qi,Qii,Pi, and Pboot) as 
defined above, are computed to produce four estimates of fl. Finally, the 



29 curvature-adjusted MCMC sampler from Cooley et al. (2012) is run on each 



simulated dataset, with fl estimated from Qn and Pi, evaluated at the 
maximum pairwise likelihood estimate of 

Figure [5] shows that, for this simulation, the OFS adjustment produces 
credible intervals that cover at almost exactly their nominal rates. Fur- 
thermore, OFS-adjusted credible intervals based on the four values of fl 
turned out nearly identical. This is about as good a result as one could 
hope for. The curvature-adjusted sampler also achieves nominal coverage. 
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The un-adjusted intervals systematically under-cover for each of the three 
parameters. This is expected (as noted by Cooley et al. (2012)), as the 



pairwise likelihood over-uses the data by including each location in roughly 
n/2 terms of the objective function rather than just one, as would be the 
case with a likelihood function. This results in a pairwise likelihood surface 
that is far too sharply-peaked relative to a likelihood surface. The OFS 
adjustment seems to successfully compensate for this effect. 



5 Data analysis 
5.1 Bird Counts 

Next, we apply tapered quasi-Bayesian analysis to a hierarchical model that 
includes a Gaussian process. Instead of a purely spatial random field as 



in Section 4.1 we assume a spatio-temporal random field, which highlights 
some of the advantages of tapering over other methods designed for large 
spatial datasets. 



- The dataset comes from a "citizen science" initiative called eBird (|www 



ebird. org). The idea of citizen science is that many non-professional ob- 
servers can be leveraged to collect an enormous amount of data. eBird 
participants across North Americal record the birds they see, along with 
the time and location of the observation, into a web-based database. Here, 
we look at 6114 observations of the Northern Cardinal in a section of the 
eastern United States over a period from 2004 to 2007 (Figure [T]). 

Inspection of the data suggests an overdispersed Poisson model. Let 
Yi, . . . ,Yn be observed counts and X = [x^^, . . . , x„] be a matrix of covariates 
associated with each observation. Also, let S = [s-^, . . . , s„] and T = [t-,^, . . . , t„] 
be the spatial and temporal locations, respectively, associated with Yi, . . . ,Yn, 
with space indexed by latitude and longitude. 

For this example, we deliberately chose a small number of predictors, 
several of which vary spatially. Preliminary analyses led to a set of 10 
covariates that includes time of day, day of year, human population den- 
sity, percentage of developed open space (single-family houses, parks, golf 
courses, etc.), tree canopy density, and variables that measure observer ef- 
fort. Simple transformations (logs, powers, etc.) were applied to some of 
the covariates, as suggested by ornithologists and exploratory analyses. 
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We specify the model as 



yi\\i ~ Pois(Ai) 

logAi(xi,Si,tj) = X-/3 + Zj(si,tj) + 

z(S,T)|6>* ~ Af(0,S*(6'*;S,T)) 

£i\T ~ iid iV(0,o-^). (10) 

We assume that the random effect z(S,T) has a Gaussian random field 
structure. Thus, this model is an example of "model-based geostatistics" 



of Diggle et al. (19981, a class of spatial generalized linear models that has 
seen wide application in the environmental literature. 

Even though Northern Cardinals are not migratory birds, a spatio- 
temporal structure for the random effect has a great intuitive appeal. One 
can easily imagine clusters of birds habiting different locales, moving from 
place to place based on things like food availability or disturbances. The 
correlation range of the spatio-temporal random effect informs the ornitholo- 
gists about the scales of movements of Northern Cardinals in space and time, 
as well as providing clues about what un-measured covariates are needed to 
explain the pattern of Northern Cardinal observations. 

The parameter e can be interpreted as either an overdispersion parame- 
ter, or as the traditional "nugget" effect, representing small-scale variation 
or measurement error. It will be convenient to marginalize over the random 
effects z and e and consider the distribution of the log-means directly. Fur- 
thermore, we win write the matrix S*(0*; S, T) + a^l simply as S, T) 
and condense 0* and al into the single parameter vector 6. The resulting 



model, equivalent to (10), is written as 



yi\Xi ~ Pois(Ai) 
logAi(xi,Sj,ti) = hi 

b(X,S,T)|0,/3 ~ iV(X/3,5](0;S,T)). (11) 

Another level in the hierarchy imposes a ridge penalty on the regression 
coefficients /3, specified as 

/3~iV(0,a|l). (12) 
Finally, we need priors for the parameters 6 and cj^ 
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15 
16 
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independently. 



For J^{6; S, T), we chose a spatio-temporal covariance model from Gneit' 



ing (2002). The covariance fmictions described therein are nonseparable in 



that (except in special cases) they cannot be written as the product of a 
purely spatial and purely temporal covariance function. Specifically, we let 



C(r;h,u) 



a 



(au2" + 1)2 



exp 



-(c/(72)h27 

(au2^Tl)^ 



(13) 



where h and u are distances between observation points in space and time, 
respectively. The parameters a G (0, 1] and 7 € (0, 1] control the smoothness 
of the process. We fixed these parameters at convenient values of 1 and .5, 
respectively, because they were not well-identified by the data. 

The parameter u S [0,1] has the nice interpretation of specifying the 
degree of nonseparability between purely spatial and purely temporal com- 
ponents; when a; = 0, C{6; h, u) is the product of a purely temporal and a 
purely spatial (exponential) covariance function. 

Priors for the parameters a^, a, c, o"^, and o"^ are specified as vague 
Cauchy distributions, truncated to have only positive support. The interac- 
tion parameter u is given a uniform prior on [0, 1]. 

A valid spatio-temporal taper matrix may be constructed as the element- 
wise product of a spatial and a temporal taper matrix 

T = T, o Tt. 

Constructed this way, T inherits the sparse entries of both and T^, and 
may therefore itself be extremely sparse. 

Several other methods exist to mitigate the computational burden im- 



22 posed by large spatial datasets with non-Gaussian responses. Wikle (2002) 



23 and Royle and Wikle ( 2005 ) embed a continuous spatial process into a latent 



grid and work in the spectral domain using fast Fourier methods. However, 
applying Fourier methods here is problematic, as it is not obvious how to 
do so for a process that has spatio-temporal structures. Low rank meth- 



ods like predictive processes (Banerjee et al. , 2008 Finley et al. , 2009) and 



28 fixed rank Kriging (Cressie and Johannesson, 2008) are also popular for 



spatial data. Applying these methods to spatio-temporal models is possi- 
ble, but awkward. For predictive processes, one must decide how to specify 
knot locations in space x time. For fixed rank Kriging, one must specify 
knot locations as well as space-time kernel functions. Fixed rank Kriging as 



33 been adapted to the spatio-temporal setting (Cressie et al. , 2010) through a 
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linear filtering framework, but only for Gaussian responses. Finally, Gauss- 
Markov approximations to continuous spatial processes are fast to compute, 



^speci ally when using Laplace approximations in place of MCMC ( Lindgren 



et al. 


2011 


Rue et al. 


2009 



to data with spatio-temporal random effects. 

In contrast, application of the tapering approach in the spatio-temporal 
context is immediate and even potentially enjoys increased computational 
efficiency relative to the purely spatial context because of the additional 
sparsity induced by element-wise multiplication with the temporal taper 
matrix. 

For the eBird data, a taper range of 20 miles and 60 days gives a tapered 
covariance matrix with about .5% nonzero elements. MCMC was carried out 
using a block Gibbs sampler. Each evaluation of the expensive normal log 
likelihood was replaced by its tapered analogue. Within each Gibbs itera- 
tion, each of b, 6, and cr| are updated with a random walk Metropolis step. 
The full conditional distribution for f3 is conditionally conjugate, enabling 
a simple update as a draw from the appropriate normal distribution. 

As described in Section 3.3, the tapered Gibbs sampler was run twice. 
Samples from the first run were used to produce point estimates of 6 and 
the marginal OFS adjustment matrix 0^1^,3 ^.2. The estimate ^01^,3 ^.2 
was computed from the asymptotic expressions for Q and Q evaluated at 
6qb, the quasi-posterior mean. Because the quasi-posterior distribution of 
interaction parameter uj was nearly uniform on [0,1], it was excluded from 
the adjustment. The conditional OFS was not attempted, as doing so would 
have required several months of computation time. 

After discarding 5000 burn-in iterations, 5000 MCMC samples were used 
for estimation and prediction. Pointwise quantiles of the posterior correla- 
tion surface are shown in Figure [6j for both the un-adjusted and adjusted 
samples. The point at which the correlation drops to .05, often called the 
"effective range" of a process, is the most extreme contour displayed in each 
of the plots in figure |6j While the two sets of contours do not differ much 
in the median, they are quite different in the upper and lower quantiles. 
For this analysis, the correlation structure is a key component with a use- 
ful interpretation, so its posterior uncertainty is of interest. Comparing the 
OFS-adjusted and un-adjusted correlation surfaces, it is interesting to note 
that OFS adjustment gives decreased temporal uncertainty but increased 
spatial uncertainty. 

The fairly long median effective range of around 225 days at spatial lag 
seemed reasonable to a panel of ornithologists, as Northern Cardinals, while 
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they do move around to some degree, are not migratory birds. The effective 
range of 3 miles at time lag seemed reasonable as well. Northern Cardinals 
build new nests each year and are socially monogamous within a breeding 
season, but divorces sometimes occur between years. They generally stay 
close to the nest to forage and bathe. Males are highly territorial and will 
occasionally challenge neighboring males' breeding territories. These behav- 
iors are consistent with a the posterior median temporal dependence range 
of a significant fraction of a single year, and a posterior median spatial range 
that is larger than but in the ballpark of an individual's territorial range. 

Posterior estimates for some of the more interesting fixed effects, along 
with 95% pointwise credible intervals, are plotted in Figure [7| The top 
right panel shows a clearly increasing trend as a function of the number 
of hours spent observing. The top right panel shows that the effect as a 
function of time of day increases until about 8 a.m. and then decreases until 
about 3 p.m., when it again begins to increase. The increase after 3 p.m. 
is accompanied by very wide credible intervals. In the bottom left panel, 
we can see an overwhelming negative effect at high elevations. Finally, the 
bottom right panel shows a seasonal cycle that peaks in early winter and 
attains its minimum in later summer. 

Recall that these fixed effects are on the log scale. Here again, a panel 
of ornithologists was pleased with the results. Obviously, the number of ob- 
served counts should increase with the amount of time an observer spends 
watching. The peak in the time of day effect at around 8 a.m. reflects the 
time of the highest activity level of the birds. The wide confidence bands 
starting at around 4 p.m. probably results from a lack of data in the after- 
noon. Northern Cardinals cannot live in habitats found at higher elevations, 
a fact reflected in the huge negative effect estimated after about 700 meters. 
Finally, cardinals tend to be easier to detect during the winter months be- 
cause they are more vocal, and they visit feeders more frequently. In the 
summer months, they tend to stay more hidden because it is their breeding 
season, and they do not visit feeders as often because food is more plentiful. 
These seasonal variations in detectability are reflected in the pattern shown 
in the estimated date effect. 

The median posterior predicted surface (Figure [s]) of the mean counts 
was generated by drawing from the posterior predictive distribution at a 
large set of sample points in the spatial domain, for flxed values of "effort" 
covariates, and at a flxed time. 

Maps like Figure |8j of course, vary in time as well as space. The most 
prominent feature of the predicted surface is the very low values along the 
Appalachians. This is a result of the huge elevation effect, and corroborates 
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1 expert knowledge. Another noticeable feature of the prediction surface is 

2 the elevated counts around population centers. It is well-known among or- 

3 nithologists that Northern Cardinals are most common in the suburbs. This 

4 happens for two reasons. The first is that they are attracted to the many 

5 bird feeders found in the suburbs. The second is that suburban habitat, 

6 with landscaped gardens and mixes of open areas, shrubs, and trees, is ideal 

7 habitat for cardinals. 



6 Discussion 
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The open-faced sandwich adjustment provides a way to incorporate estimat- 
ing functions that are not likelihoods into Bayesian-like models. While the 
resulting inference does not enjoy the elegant formal probabilistic interpre- 
tation of pure Bayesian analysis, it does inherit some of its most desirable 
attributes: borrowing strength across parameters, the ability to work with 
complicated hierarchical structures, and propagating uncertainty through- 
out model components, to name a few. When the likelihood function is 
unknown or has undesirable properties, the OFS adjustment allows one to 
retain these beneficial features of Bayesian analysis while avoiding the need 
to compute the likelihood function by substituting a suitable objective func- 
tion in its place. 

These benefits come with a few costs. First, the resultant MCMC sam- 
ples may not be interpreted as though they came from a true Bayesian model. 
Second, the coverage characteristics of the output are only as good as the 
applicability of the asymptotic approximation and the practitioner's ability 
to estimate the sandwich matrix, which can be a difficult task in some sit- 



uations (Kauermann and Carroll, 2001). Third, estimating the adjustment 



matrix using sample moments or a bootstrap and relying on it to produce 
posterior samples has a decidedly "un-Bayesian" feel to it. Finally, using the 
adjustment in the Gibbs sampler context does require approximately twice 
the computational effort as the un-adjusted Gibbs sampler, which in some 
cases can be considerable. 

In addition to these considerations, comparisons between the OFS ad- 



justment and the curvature adjustment of Cooley et al. (2012) seem natural. 



In our simulations, both adjustments performed extremely well. The cur- 
vature adjustment shares with OFS both the advantages and disadvantages 
described above. But in addition, the curvature adjustment, as implemented 



in the data example in Cooley et al. (2012), has several additional drawbacks 



to consider. First, Cooley et al. ( 2012 ) require an outside method to estimate 
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of 6 and ft, whereas the OFS adjustment uses the MCMC sample to esti- 
mate 6 and fl. Using the un-adjusted quasi-posterior sample to estimate 6 
and Q as we do here takes advantage of borrowing strength, leveraging prior 
information, etc. that simply maximizing ^Af(^) cannot. We note, however, 
that this drawback in the curvature adjustment can easily be avoided. One 



could easily apply the strategy that we suggest in Section 3.3, running the 
sampler first without adjustment to estimate and in a Bayesian-like 
way, and then using these estimates to implement the curvature-adjusted 
sampler. In the Metropolis context, this strategy requires twice the compu- 
tational effort of OFS; since OFS is applied to the sample post hoc, there 
is no need to run the sampler a second time. In the Gibbs sampler setting, 
however, the computational burden is identical. 

More obvious is the enormous computational cost imposed by estimating 
the conditional adjustment matrix in the adaptive version of the curvature 



adjusted Gibbs sampler favored by Cooley et al. (2012). This simply would 
not have been feasible, for example, in the eBird example of Section[5.1[ We 



note that this complication can probably be avoided by using the marginal, 
rather than the conditional, version of ft. In fact, since their simulated 
comparisons between the marginal and conditional forms of fl performed 



so similarly, we are confused as to why Cooley et al. (2012) use the much 



more computationally expensive conditional version in their data example. 
In the end, conditional on implementation details, the OFS and curvature 
adjustments are quite similar both in performance and in spirit. 
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Figure 3: Comparison of marginal and conditional OFS-adjusted quasi- 
posterior distributions. Panels (a) and (b) show qq-plots of the marginal 

quasi-posteriors of the two covariancc parameters. Panel (c) shows contour 
plots of a kernel density estimate of the joint quasi-posterior of the same 
parameters. 
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Figure 4: A realization of the Gaussian extreme value max-stable process of 



Smith (1990). 
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Figure 5: Empirical coverage rates for equi-tailed credible intervals based 
on MCMC samples using the pairwise likelihood. Colored curves are OFS- 
adjusted samples using different estimates of ft, and from a curvature- 
adjusted sampler. Dotted curves are un-adjusted samples. 
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Figure 6: Posterior quantiles of spatio-temporal correlation surface. The 
top row was computed from the un-adjusted sampler, and the bottom row 
was computed from the OFS-adjusted sampler. The median surfaces are 
similar, but the high and low quantile contours show that the uncertainty is 
noticeably altered by the adjustment. 




Figure 7: A subset of the estimated fixed effects. Solid lines are posterior 
medians, and shaded areas are posterior pointwise 90% credible intervals. 
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Figure 8: Median predicted surface for 8 a.m. on April 11. 
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